# DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

> Purpose: This README provides all the information needed for reviewers to reproduce the pipeline reported in our manuscript. **Please note that this repository is intended strictly for code review purposes. Unauthorized use for any other purposes, including commercial use, is prohibited without prior consent from the authors.**

---

## 1. Repository Structure

The main directories and files are:

```text
├── answer/                             # (optional) generated answers after inference
├── checkpoints/                        # (optional) model checkpoints
├── data/                               # examples for source data
├── packages/                           # third-party packages used for training
├── tokenized_data/                     # (optional) tokenized data for training
├── train_script/                       # training scripts
│   ├── ds_z2_config.json                   # deepspeed ZeRO-2 configuration file
│   ├── ds_z3_config.json                   # deepspeed ZeRO-3 configuration file
│   ├── qwen2vl_full_sft_stage1.yaml        # training configuration file for the 1st stage
│   └── qwen2vl_full_sft_stage2.yaml        # training configuration file for the 2nd stage
├── get_score.py                        # calculate final accuracy/hit-rate and IoU scores
├── inference.py                        # inference code for generating answers
├── requirement.txt                     # required packages for environment setup
└── run.sh                              # shell script to run the training and inference pipeline
```
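
For readers unfamiliar with the IoU metric that `get_score.py` is described as reporting, the following is a minimal sketch of intersection-over-union for two axis-aligned boxes. It is not the repository's implementation: the function name `box_iou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration only.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes, each assumed to be (x1, y1, x2, y2)."""
    # Width and height of the overlap region; clamp to zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IoU of 1, and boxes that do not overlap give 0.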

## 2. Environment Setup

To set up the environment, first install the copy of LLaMA-Factory bundled under `packages/`, then install the remaining dependencies from the provided `requirement.txt` file, and finally build `flash-attn`. We recommend creating a fresh virtual environment before installing anything.

```bash
conda create -n dentvlm python=3.10.16
conda activate dentvlm

cd packages/LLaMA-Factory
pip install -e ".[torch,metrics]"

cd ../..
pip install -r requirement.txt

# MAX_JOBS can be adjusted based on your system's capabilities
MAX_JOBS=32 pip install flash-attn==2.6.3 --no-build-isolation
```
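
After installation, a quick sanity check can confirm that the key packages are visible to the interpreter. The snippet below is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the import names `torch`, `llamafactory`, and `flash_attn` are assumed to correspond to the packages installed by the commands above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["torch", "llamafactory", "flash_attn"]
    absent = missing_modules(required)
    print("all found" if not absent else f"missing: {absent}")
```

Run it inside the activated `dentvlm` environment. A non-empty "missing" list suggests that one of the install steps did not complete.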

## 3. Reproducing the Pipeline

To reproduce the pipeline, run the provided `run.sh` script, which executes the training and inference steps sequentially. Before running it, check the configuration files in the `train_script/` directory and adjust them to match your own setup and requirements.

```bash
bash run.sh
```
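
Before launching a long run, it can help to confirm that the files the pipeline relies on are present. The following is a minimal sketch under stated assumptions: the file list is copied from the repository structure in Section 1, and the helper `missing_files` is hypothetical and not part of the repository. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the repository structure listed in this README.
EXPECTED_FILES = [
    "run.sh",
    "inference.py",
    "get_score.py",
    "train_script/ds_z2_config.json",
    "train_script/ds_z3_config.json",
    "train_script/qwen2vl_full_sft_stage1.yaml",
    "train_script/qwen2vl_full_sft_stage2.yaml",
]


def missing_files(root: str, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected relative paths that are not regular files under root."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_files(".")
    print("all files present" if not missing else f"missing: {missing}")
```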

## 4. Authors and Contact Information
- Authors (DentVLM team):

    Zijie Meng<sup>1,2</sup>, Jin Hao<sup>3</sup>, Xiwei Dai<sup>1,2</sup>, Yang Feng<sup>4</sup>, Jiaxiang Liu<sup>2</sup>, Bin Feng<sup>1</sup>, Huikai Wu<sup>4</sup>, Xiaotang Gai<sup>1,2</sup>, Hengchuan Zhu<sup>1,2</sup>, Tianxiang Hu<sup>1,2</sup>, Yangyang Wu<sup>2</sup>, Hongxia Xu<sup>5</sup>, Jin Li<sup>6</sup>, Jun Xiao<sup>2</sup>, Xiaoqiang Liu<sup>7</sup>, Joey Tianyi Zhou<sup>8</sup>, Fudong Zhu<sup>1</sup>, Zhihe Zhao<sup>9</sup>, Lunguo Xia<sup>3</sup>, Bing Fang<sup>3</sup>, Jimeng Sun<sup>10</sup>, Jian Wu<sup>2,5</sup>, Zuozhu Liu<sup>1,2,5</sup>

- Author affiliations:

    1. Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310016, Zhejiang, China.  
    2. College of Computer Science and Technology, Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Hangzhou 310027, Zhejiang, China.  
    3. Department of Orthodontics, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China.  
    4. Angelalign Technology Inc., Shanghai 200082, China.  
    5. Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence, Haining 314400, Zhejiang, China.  
    6. Department of Stomatology, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen 518035, China.  
    7. Department of Prosthodontics, Peking University School and Hospital of Stomatology, Beijing 100081, China.  
    8. CFAR & IHPC, Agency for Science, Technology and Research, 138632, Singapore.  
    9. State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China.  
    10. Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

**For inquiries, please contact Z.L. (zuozhuliu@intl.zju.edu.cn).**